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ABSTRACT 

The genome-wide distribution patterns of the '6th 
base' 5-hydroxymethylcytosine (5hmC) in many 
tissues and cells have recently been revealed 
by hydroxymethylated DNA immunoprecipitation 
(hMeDIP) followed by high throughput sequencing or 
tiling arrays. However, it has been challenging to 
directly compare different data sets and samples 
using data generated by this method. Here, we 
report a new comparative hMeDIP-seq method, 
which involves barcoding different input DNA 
samples at the start and then performing hMeDIP- 
seq for multiple samples in one hMeDIP reaction. 
This approach extends the barcode technology from 
simply multiplexing the DNA deep sequencing 
outcome and provides significant advantages for 
quantitative control of all experimental steps, from 
unbiased hMeDIP to deep sequencing data analysis. 
Using this improved method, we profiled and 
compared the DNA hydroxymethylomes of mouse 
ES cells (ESCs) and mouse ESC-derived neural pro- 
genitor cells (NPCs). We identified differentially 
hydroxymethylated regions (DHMRs) between ESCs 
and NPCs and uncovered an intricate relationship 
between the alteration of DNA hydroxymethylation 



and changes in gene expression during neural 
lineage commitment of ESCs. Presumably, the 
DHMRs between ESCs and NPCs uncovered by this 
approach may provide new insight into the function of 
5hmC in gene regulation and neural differentiation. 
Thus, this newly developed comparative hMeDIP-seq 
method provides a cost-effective and user-friendly 
strategy for direct genome-wide comparison of DNA 
hydroxymethylation across multiple samples, lending 
significant biological, physiological and clinical 
implications. 



INTRODUCTION 

DNA methylation at the 5-position of cytosine (5mC) is 
an important epigenetic modification that plays crucial 
roles in mammalian development and cell differentiation 
(1). Recent studies show that this apparently stable modi- 
fication can be removed via oxidation by the ten-eleven 
translocation (TET) family of proteins. TET proteins 
oxidize 5mC to 5-hydroxymethylcytosine (5hmC), 
5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) 
(2-4). Among the three 5mC oxidative derivatives, 
5hmC is the most abundant form in vivo and can be 
detected in almost all mammalian tissues and cells (4-8). 
In addition to being one of the intermediate states of 
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DNA demethylation, 5hmC is now being considered an 
epigenetic modification (9-11). 

Genome-wide mapping of 5mC and 5hmC reveals the 
genomic locations of these modifications, which is im- 
portant for the elucidation of their functions (10,11). 
Several groups have used 5hmC-specific affinity 
pull-down techniques [for example, hydroxymethylated 
DNA immunoprecipitation (hMeDIP)] followed by 
next-generation sequencing (NGS) or tiling arrays to 
profile the genome-wide 5hmC distribution in brain 
tissues and embryonic stem cells (12-18). However, due 
to a few limitations buried in the conventional 
hMeDIP-seq method, it remains challenging to use 
DNA hydroxymethylome data generated by conventional 
hMeDIP-seq to perform direct genome-wide comparative 
analysis between high- and low-5hmC samples. For 
example, the current standard protocol of Illumina 
NGS requires loading of the same amount of DNA 
libraries for different samples in cluster generation and 
sequencing, which cancels out the differences in hMeDIP 
products among different samples (especially for those 
with differential 5hmC abundance). In addition, the 
multiple steps of the conventional hMeDIP-seq method, 
including hMeDIP, library construction, cluster hybrid- 
ization and sequencing, may amplify experimental vari- 
ation across the samples. Many laboratories have 
developed elegant computational methods to normalize 
or adjust the DNA methylome data generated from dif- 
ferent samples (19-22). However, the bias caused by the 
intrinsic limitations of the conventional MeDIP-seq or 
hMeDIP-seq methods has not yet been overcome. The 
dramatic variation of 5hmC levels among distinct tissues 
and cells or during cell differentiation and embryo devel- 
opment demands new strategies or approaches to 
overcome these technical difficulties (6,23). 

We reasoned that if the multiple steps required for 
hMeDIP-seq can be performed in one reaction system 
with multiple samples being compared, the experimentally 
induced variation could be significantly reduced, therefore 
overcoming the aforementioned limitation of the conven- 
tional hMeDIP-seq. To address this issue, we applied 
barcode technology (multiple index sequencing), which 
has recently been developed to distinguish different 
DNA samples in next-generation sequencing (24,25). 
The sequence reads of hMeDIP, as well as input, can be 
sorted into several groups representing different samples 
based on their specific barcode, which is assigned to each 
sample before performing the hMeDIP. Importantly, this 
modified hMeDIP strategy makes the obtained hMeDIP- 
seq data from multiple samples with different levels of 
5hmC accessible for accurate comparative analysis. This 
approach has been successfully used to compare the 
hydroxymethylomes of the benign nevus and melanoma 
(26). In this report, we first describe the methodology of 
this approach in detail and then demonstrate its broad 
application in understanding the functional role of 
5hmC during neural differentiation. In particular, we 
identified the genome-wide differentially hydroxymeth- 
ylated regions (DHMRs) between mouse embryonic 
stem cells (ESCs) and neural progenitor cells (NPCs), 
and analysed their correlation with changes in gene 



expression. Together, our studies reveal dynamic 
changes in DNA hydroxymethylation during neural dif- 
ferentiation of ESCs and suggest that this barcode-based 
comparative hMeDIP-seq method can be used to compare 
the 5hmC patterns of multiple biological or pathological 
samples. 

MATERIALS AND METHODS 

Cell culture 

46C mouse ES cells (generously given by Dr Qi-Long 
Ying, USC) were cultured in high-glucose Dulbecco's 
modified Eagle's medium (DMEM) (Hyclone- 
ThermoScientific) supplemented with 10% fetal bovine 
serum (Invitrogen), 0.1 mM non-essential amino acid 
(Invitrogen), 0.1 mM (3-mercaptoethanol (Invitrogen) 
and 1000 U/ml leukaemia inhibitory factor (Millipore). 
ESC-derived NPCs were differentiated from 46C ES 
cells according to the monolayer neural differentiation 
protocol (27,28). Rosette-like Soxl-GFP+ NPCs (>95% 
Soxl-GFP+ NPCs) were used in our experiments at day 9 
upon neural differentiation. 

Immunofluorescence 

Cells were fixed in 4% paraformaldehyde and penetrated 
using PBS with 0.1% Triton X-100. Cellular DNA was 
denatured for 30min by 2N HC1 followed by neutraliza- 
tion for 5min using 100 mM Tris, pH 8.0. After blocking 
in 5% serum for 1 h, cells were incubated with anti-5hmC 
antibody (1:5000) (active motif). After washing, cells were 
incubated with Cy3-labelled anti-rabbit IgG (beyond 
time), and the nuclei were labelled by DAPI. 
Fluorescence was observed and recorded using Leica 
Microscopy. 

Reverse transcription-quantitative PCR (RT-qPCR) 

RNA was extracted using Trizol reagent (Invitrogen) ac- 
cording to the manufacturer's protocol. One microgram 
RNA was used for reverse transcription by kit 
(TAKARA). The 20 ul cDNA was diluted into 200 ul. 
The transcript levels of Tetl, Tet2 and Tet3 were 
measured using quantitative real-time PCR with condi- 
tions: 95°C for 2min followed by 40 cycles at 95°C for 
15 s and 60°C for 31s. Quantitative real-time PCR was 
carried in an iQ5 real-time PCR cycler (Bio-Rad) using 
the premixed 2x real-time SYBR Green reagent 
(TAKARA). Sequences of primers are described in 
Supplementary Table SI. 

Dot blot analysis 

Genomic DNA was extracted from cells using DNeasy 
Blood and Tissue kit (Qiagen). Denatured DNA was 
spotted on a nitrocellulose membrane (Waterman) and 
cross-linked by UVC (ultraviolet C) irradiation (Hoefer) 
for 5min. The membrane was then blocked with 5% milk 
in TBS-Tween 20 for 1 h and incubated with anti-5hmC 
antibody (1:10000) (active motif) at 4°C overnight. After 
incubation with a horse radish peroxidase (HRP)- 
conjugated anti-rabbit IgG (beyond time) for 1 h at 
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room temperature, the membrane was washed with 
TBS-Tween 20 three times and then DNA was detected 
by Western blotting analysis system (Kodak) using 
enhanced chemiluminescence (GE-Healthcare). The dot 
blot intensity was quantified by Image-J software (NIH). 

Comparative hMeDIP-seq 

Library construction (adding adaptor containing 
barcode sequence) 

Genomic DNA was sonicated to < 500 bp by Bioruptor 
sonicator (Diagenode) and quantified using the 
Quant-iTdsDNA HS assay kit (Invitrogen) according to 
the manufacturer's manual. The sonicated DNA frag- 
ments (4ug of each sample) were end-repaired using the 
End-It DNA End Repair Kit (EPICENTRE Biotech- 
nologies) according to the manufacturer's instructions, 
followed by treatment with Klenow fragment 3'-5'exo 
(NEB) and dATP to generate a protruding 3'A for 
ligation with the adaptor containing a specific barcode 
sequence. The barcode sequence (four-base index: 
CCAG and TAGC) within the adaptor was designed as 
described previously (24) with some modifications. 

hMeDIP 

Equal amounts (~4ug) of barcode-tagged gDNA from 
ESCs and NPCs were pooled together in one tube. The 
mixed DNA was denatured and diluted by IP buffer (10% 
was taken off as input at this step). The denatured DNA 
was incubated with 4 ul anti-5hmC antibody (active motif) 
at 4°C overnight. Antibody-DNA complexes were 
captured by protein A/G beads, and the enriched 
5hmC-containing DNA fragments (hMeDIP product) 
were purified. 

Library amplification 

The hMeDIP products, as well as input DNA, were 
amplified for 10-12 cycles, and the PCR products were 
purified by Qiagen Mini Gel Purification Kit. 

Next-generation sequencing 

The amplified libraries were submitted for cluster growth 
and sequencing by the Illumina Genome Analyzer II 
(GAII). 

hMeDIP-seq data processing 

The image analysis and base calling were performed with 
the Illumina package OLB (vl.8). We separated the raw 
sequence reads of hMeDIP and Input into different files, 
according to the specific barcode sequences (4-bp) at 5'- 
end of each sequence read. 

Sequence reads were mapped onto the reference mouse 
genome (NCBI Build UCSC mm9) using the Bowtie 
(vO.12.7) algorithm (29). Unique and monoclonal reads 
were used for further analysis. Since the average DNA 
fragment length used in hMeDIP-seq was 350 bp, each 
sequence was extended to 350 bp. Summary of 
hMeDIP-seq data is shown in Supplementary Table S2. 
The sequencing data have been deposited in the Gene 
Expression Omnibus database under accession number 
GSE40810. 



The distribution of 5hmC reads at promoters or in gene 
body regions was analysed by house-made software (13). 
The 5hmC-enriched regions (5hmC peaks) were identified 
using MACS (vl.4) at P < le-5 and FDR < 0.01 (30). The 
5hmC peaks were annotated to mouse Refseq genes and 
the genes with 5hmC peaks at promoters or in gene body 
regions were chosen for further analysis. 

Analysis of the relationship between DNA 
hydroxymethylation and gene expression 

Gene expression data of ESCs (31) and NPCs (32) were 
downloaded from GEO dataset [Accession number of 
GEO: ESCs (ES_repl: GSM198062, ES_rep2: 
GSM 198063, ES_rep3: GSM 198064), NPCs (NPC_repl: 
GSM618399, NPC_rep2: GSM618400)]. The raw CEL 
files were processed by robust multi-array average 
(RMA), and log 2 intensities were used to represent the 
expression levels after normalization between samples. 
Genes were separated into five groups with respect to 
their average signals of probes, and the 5hmC distribution 
at promoters and in gene body regions of these five groups 
of genes were plotted for ESCs and NPCs, respectively. 

The genes were also classified into '5hmC+' and 
'5hmC— ' groups according to the presence or absence of 
5hmC peaks at either promoters [TSS (transcription start 
site ) ± 1 kb] or gene body regions [from 'TSS + 1 kb' to 
'TES (transcription end site)']. The differences in average 
gene expression levels between these groups of genes were 
analysed by ANOVA. 

Identification of DHMRs 

Pooled 5hmC peaks were called from the un-separated 
hMeDIP-seq data by MACS (30), and the densities of 
5hmC peaks were compared between ESCs and NPCs. 
For the quantitative analysis, the densities of 5hmC 
peaks between ESCs and NPCs were normalized by 
their input respectively. The threshold was set at 1-fold 
(NPCs versus ESCs, log 2 5hmC density ratio > 1 
or<— 1) for identification of 'DNA de-hydro- 
xymethylation (loss of 5hmC)' regions and 'de novo 
DNA hydroxymethylation (gain of 5hmC)' regions 
during neural differentiation. 

hMeDIP-qPCR 

Two microgram sonicated gDNA was denatured and 
incubated with 1 ul anti-5hmC antiserum (active motif) 
for hMeDIP as described previously (13,33). Normal 
IgG antibodies were used as control. Input and hMeDIP 
products were used as templates for quantitative real-time 
PCR. Relative 5hmC enrichment was calculated by 2 dCt 
(dCt = Ct Input — Ct hMeDIP ). Primers used in hMeDIP- 
qPCR are described in Supplementary Table S3. 

Glucosylation of genomic DNA followed by 
methylation-sensitive qPCR (glucMS-qPCR) 

The '5hmC and '5hmC + 5mC levels in DHMRs contain- 
ing Mspl/Hpall sites were measured by a restriction 
enzyme-based assay (EpiMark kit, NEB) (34). Genomic 
DNA was treated with or without T4 Phage 
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P-glucosyltransf erase and then digested by Mspl, Hpall or 
no enzyme (mock digestion). The Mspl- and 
Hpall-resistant fraction was quantified by qPCR using 
primers designed around at least one Hpall/Mspl site, 
and normalizing to the mock digestion. Primers used in 
glucMS-qPCR are described in Supplementary Table S4. 

Gene ontology (GO) analysis 

GO analysis for genes with 'gain of 5hmC DHMRs was 
performed by the database for annotation, visualization 
and integrated discovery (DAVID) website (35,36). GO 
analysis for the 'gain of 5hmC DHMRs was also per- 
formed by the Genomic Regions Enrichment of 
Annotations Tool (GREAT) (37). 

Analysis of the relationship between the changes in 
DNA hydroxymethylation and gene expression during 
neural differentiation 

The genes were sorted according to the changes in gene 
expression level: genes that were down-regulated and 
genes that were up-regulated during neural differentiation 
(log 2 mRNA value <— 1 or >1). The changes in tag 
density of 5hmC peaks at promoters (TSS ± 1 kb) or in 
gene body regions (from TSS + 1 kb to TES) were 
compared between the up- and down- regulated genes by 
ANOVA. 

All 5hmC peaks located at promoters (TSS ± 1 kb) or 
gene body regions (from TSS + 1 kb to TES) were selected 
for dot plotting. The alteration in 5hmC density for each 
5hmC peak was plotted on the j-axis and the change in 
corresponding gene expression was plotted on the x-axis. 
Correlation between the changes in 5hmC and gene ex- 
pression was analysed by linear regression analysis. 



RESULTS 

Strategy for comparative hMeDIP-seq method 

To compare the DNA hydroxymethylomes of diverse cell 
types, we optimized the conventional hMeDIP-seq 
method in multiple aspects (as illustrated in Figure 1). 
For instance, after sonication, the genomic DNA frag- 
ments from different samples were labelled with 
adaptors containing specific barcode sequences. Equal 
amounts of barcode-tagged DNA fragments from differ- 
ent samples were then pooled together in one tube for 
hMeDIP (the input was taken out before IP). The 
hMeDIP and input libraries were amplified and sequenced 
using Illumina GAII. The raw reads from different 
samples were sorted according to their specific barcode 
sequences, and mapped to the genome for further bio- 
informatic analysis. Immunoprecipitation and sequencing 
of multiple samples in one reaction system markedly 
reduce the experimental variation among samples and 
enable direct comparison of the DNA hydroxy- 
methylation data across samples. Thus, we refer to 
this new approach as the 'comparative hMeDIP-seq 
method'. 



Reduced global 5hmC level in NPCs as compared 
with ESCs 

Epigenetic modifications undergo dynamic changes during 
cell differentiation and development. Directed differenti- 
ation of pluripotent mouse ESCs into NPCs in vitro is a 
well-established model for the study of molecular mech- 
anisms controlling neural induction. Therefore, we used 
the newly developed comparative hMeDIP-seq method 
to examine the genome-wide differences in DNA 
hydroxymethylation in ESCs and ESC-derived NPCs. A 
sufficient number of NPCs were obtained with high purity 
from mouse ESCs via N2B27 serum-free monolayer dif- 
ferentiation in vitro (Supplementary Figure Sla-c). We 
compared the expression levels of Tet family genes as 
well as the global 5hmC levels in ESCs versus NPCs. 
RT-qPCR analysis revealed that Tetl mRNA level, 
which was high in ESCs, decreased dramatically in 
NPCs, while Tet3 was up-regulated in NPCs but only 
marginally expressed in ESCs (Figure 2a). However, 
Tetl mRNA levels were comparable between ESCs and 
NPCs (Figure 2a). Both immunofluorescence (Figure 2b) 
and dot blot analysis (Figure 2c) showed that the global 
5hmC levels in NPCs were much lower than those in 
ESCs, suggesting that pluripotent ESCs and neural 
lineage-committed NPCs are in two different developmen- 
tal stages with high and low 5hmC abundance, respect- 
ively. As expected, comparative hMeDIP-seq data showed 
that the 5hmC reads of ESCs and NPCs (after normalized 
to the corresponding input) were in complete agreement 
with changes of the global 5hmC levels detected between 
ESCs and NPCs by dot blot (Figure 2d). These data 
indicate that the new comparative hMeDIP-seq method 
preserves the relative difference in 5hmC abundance 
across different samples. 

The average distribution of 5hmC was mapped onto an 
average gene model. ESCs exhibited a hypo-hydroxy- 
methylated 'valley' with a small 'bumper' around TSS 
regions, but significant 5hmC enrichment was found 
across gene body regions (Figure 2e), consistent with the 
previously reported 5hmC pattern in ESCs (13-15). 
However, neither a 5hmC dip with a small 'bumper' 
around TSS regions nor significant 5hmC enrichment in 
gene body regions was observed in NPCs (Figure 2f). This 
suggests that both the total level of 5hmC and the genomic 
distribution of 5hmC are dramatically different between 
ESCs and NPCs (38). 

Recent reports have suggested significant 5hmC enrich- 
ment at enhancers (defined by the enrichment of both 
H3K4mel and H3K27ac histone modifications) in ESCs 
(17,39—42). Therefore, we next examined the association 
of 5hmC with the annotated enhancers and DNase I hyper- 
sensitive sites (DHSs), as well as the loss of 5hmC at these 
newly defined genomic elements during ESC to NPC differ- 
entiation. Consistent with previous findings, we observed 
significant 5hmC enrichment at the enhancers in ESCs 
(Supplementary Figure S2a). Interestingly, this 5hmC 
enrichment at ESC enhancers is lost in NPCs 
(Supplementary Figure S2a). We also analysed the relation- 
ship of 5hmC and DHSs. We separated DHSs into three 
classes: (i) those lacking the enhancer histone modifications 
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Figure 1. Schematic diagram of the comparative hMeDIP-seq method. 



H3K4mel and H3K27ac; (ii) putative poised enhancers 
bearing only H3K4mel; and (iii) putative active enhancers 
with both modifications. DHSs with poised or active 
enhancer chromatin signatures exhibit 5hmC enrichment, 
and poised enhancers have even higher 5hmC levels than 
active enhancers (Supplementary Figure S2b-d). These ob- 
servations indicate that DNA hydroxymethylation may be 
involved in regulating the function of enhancers. 

It is noteworthy that the comparative hMeDIP-seq 
approach (two samples in one lane) generated fewer 
5hmC reads and peaks for individual samples (for 
example ESCs) than the conventional hMeDIP-seq 
approach (one sample in one lane). However, the 5hmC 
distribution patterns were consistent between the com- 
parative and conventional hMeDIP-seq methods 
(Supplementary Figure S3a-d), suggesting that the new 
comparative hMeDIP-seq method yields similar results, 
albeit at the expense of throughput. If higher throughput 
data is required, it would be necessary to sequence the 
library in more lanes and combine the data together for 
analysis. 

Correlation between DNA hydroxymethylation and 
gene expression status 

To determine whether there is a relationship between 
DNA hydroxymethylation and gene expression status in 
the same cell type, we classified genes into five groups 
based on their transcriptional levels and plotted their 
5hmC distribution patterns at promoters and gene body 
regions. As shown in Figure 3a, genes with high expression 
levels in ESCs exhibited 5hmC depletion around TSS 
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regions, whereas genes with low expression levels dis- 
played 5hmC enrichment at promoters. A similar 
negative correlation between 5hmC density around TSS 
regions and gene expression was also observed in NPCs 
(Figure 3b). Consistent with the previous findings, all five 
groups of genes have abundant 5hmC enrichment in gene 
body regions in ESCs, and genes with high expression 
levels displayed higher 5hmC enrichment than genes at 
low expression levels (Figure 3a). However, in NPCs, the 
5hmC curves of all five groups of genes could not be easily 
separated from each other because 5hmC was not particu- 
larly enriched in gene body regions (Figure 3b). 

We also separated genes into '5hmC+' and '5hmC— ' 
groups according to the presence or absence of 5hmC 
peak (s) at promoters (TSS ± 1 kb) or gene body regions 
(from TSS + 1 kb to TES) and compared their average 
gene expression levels. In ESCs, genes with 5hmC peaks 
at promoters (TSS ± 1 kb) had lower average expression 
levels than those without 5hmC peaks, while genes with 
5hmC peaks in the gene body regions (from TSS + 1 kb to 
TES) had higher expression levels than those without 
5hmC peaks (Figure 3c). In NPCs, genes with 5hmC 
peaks at promoters (TSS ± 1 kb) also had lower expres- 
sion levels than those without 5hmC peaks (Figure 3d). 
However, unlike the relationship observed in ESCs, genes 
with 5hmC peaks in gene body regions (from TSS + 1 kb 
to TES) in NPCs had lower expression levels than the 
genes without 5hmC peaks (Figure 3d). Thus, these data 
suggest that DNA hydroxymethylation in gene body 
regions correlates differently with gene expression depend- 
ing on the cell type. 

Identification of DHMRs between ESCs and NPCs 

Since the numbers and widths of 5hmC peaks were not 
consistent between two different samples, it was difficult 
to compare the peaks of ESCs and NPCs directly using the 
conventional hMeDIP-seq method. It is now possible to 
identify peaks from the un-separated (pooled) hMeDIP- 
seq data of ESCs and NPCs as the DNA hydrox- 
ymethylation-sensitive regions in both cells because the 
comparative hMeDIP-seq reactions for two different 
samples are performed in one reaction system. The 
density of pooled 5hmC peaks in ESCs and NPCs can 
then be directly compared after normalization by the 
ratio of corresponding input throughput. 

Overall, we identified 127 256 pooled 5hmC peaks from 
the un-separated hMeDIP-seq data (Figure 4a). Among 
them, 107 834 regions lose 5hmC (NPCs versus ESCs, 
log2 5hmC density ratio < — 1 ) during neural differenti- 
ation (Figure 4a). However, we also detected 2748 
regions that underwent L de novo DNA hydroxy- 
methylation (gain-of-5hmC)' (NPCs versus ESCs, log2 
5hmC density ratio > 1) (Figure 4a). Importantly, the 
DHMRs between ESCs and NPCs can be validated by 
hMeDIP-qPCR and glucMS-qPCR analysis using a rep- 
resentative set of DHMRs. As shown in Figure 4b-e, the 
5hmC peaks at Ankrd23 and Histlh2aa exhibited 'loss of 
5hmC upon neural differentiation, while the 5hmC peaks 
at Ftll and Irf2bp2 genes underwent 'de novo hydroxy- 
methylation (gain of 5hmC)'. 
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Figure 2. Differential Tetl/2/3 gene expression and global 5hmC levels in ESCs and NPCs. (a) RT-qPCR analysis of Tetl, Tet2 and Tet3 mRNA 
levels in ESCs and NPCs (mean values ± SD, n = 3). (b) Immunofluorescence analysis of 5hmC levels in ESCs and NPCs. Bar: 50 um. (c) Dot blot 
analysis of the global 5hmC levels in the gDNA of ESCs and ESC-derived NPCs. One hundred fifty nanogram gDNA for each dot. (d) The relative 
5hmC signal of dot blot and comparative hMeDIP-seq in ESCs and NPCs. *The ratio of hMeDIP/Input reads numbers in ESCs was set as 100%. 
(e) Distribution of 5hmC in TSS ± 5kb and gene body regions in ESCs. (f) Distribution of 5hmC in TSS ± 5kb and gene body regions in NPCs. 



The distributions of pooled 5hmC peaks, 'loss of 5hmC 
peaks and 'gain of 5hmC peaks in different genomic 
features were compared with the sequence coverage of 
these genomic features in the mouse genome. As previ- 
ously reported (13,16), the pooled 5hmC peaks are 
associated with exons and promoters with a much higher 
frequency than with introns and intergenic regions 
(relative to the percentage of these genome elements in 
proportion to the genome size) (Figure 4f). Strikingly, al- 
though the 'loss of 5hmC is a genome-wide feature of the 
neural lineage commitment, it is most dramatic in exons 
and promoters and less evident in introns and intergenic 
regions (Figure 4f), suggesting that the dynamic change of 
5hmC likely has a more profound effect on regulation of 
the promoters and exons during ESC differentiation. The 
'gain of 5hmC peaks show no apparent preference for the 



enrichment of these peaks in promoters, exons, introns or 
intergenic regions. Nonetheless, Gene Ontogeny (GO) 
analysis of the genes with de novo DNA hydroxy- 
methylation showed an enrichment of genes involved in 
'dendrite morphogenesis' and other neural system func- 
tions (Supplementary Figure S4 and Supplementary 
Table S5). Thus, the functional potential and mechanistic 
insight of the 'gain of 5hmC peaks during neural differ- 
entiation may warrant future investigation. 

Relationship between the dynamic changes of DNA 
hydroxymethylation and gene expression during neural 
lineage commitment of ESCs 

We next asked whether the alteration in DNA hydrox- 
ymethylation at promoters (TSS ± 1 kb) or gene body 
regions (from TSS + 1 kb to TES) is associated with the 
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changes in gene expression during neural lineage commit- 
ment of ESCs. We stratified genes into two groups: 
down-regulated (log 2 gene expression ratio <— 1) and 
up-regulated (log 2 gene expression ratio > 1) genes 
during neural differentiation (NPCs versus ESCs) and 
compared their changes in 5hmC density. Although the 
5hmC peaks at both groups of genes displayed a 
dramatic decrease during neural lineage commitment of 
ESCs, the 5hmC peaks at down-regulated genes decreased 
more than those at up-regulated genes (Figure 5a and b). 
Pooled 5hmC peaks at promoters or gene body regions 
were plotted according to their changes in 5hmC density 
and gene expression during neural lineage commitment of 
ESCs. The majority of 5hmC peaks exhibited a decrease in 
5hmC density upon neural lineage commitment of ESCs 
while a small fraction of 5hmC peaks showed an increase 
(Figure 5c and d). As a whole, there is a weak positive 
correlation between the changes in 5hmC peak density and 
corresponding gene expression during neural lineage com- 
mitment of the ESCs (Figure 5c and d). 



DISCUSSION 

The barcode technology that distinguishes DNA frag- 
ments of different samples allows performance of 
hMeDIP of multiple samples in one reaction system and 
sequencing the products in one lane. Although barcode 
technology has been widely used in high throughput 
sequencing of multiple samples and reduces cost in 
sequencing (24,25), this is the first time (to our knowledge) 
it has been applied as the first step of hMeDIP-seq. This 
new application achieves sequencing of multiple samples 
in one lane and preserves the 5hmC differences among the 
samples throughout the whole process of hMeDIP-seq 
(the latter is the most important innovation in this 
method). We could eliminate most of the systematic and 
technical errors that arise with conventional hMeDIP-seq 
method, since the experiment is processed in identical con- 
ditions and is internally controlled. The new method has 
several advantages: (i) Performing hMeDIP in one tube 
for several samples (especially samples with differential 
5hmC abundance) allows the equal IP efficiency across 
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compared samples; (ii) One reaction system with multiple 
samples for all steps after barcode tagging, including 
hMeDIP, library amplification, cluster formation and 
sequencing, can reduce accidental and systematic errors 
of the experiments; (iii) Sequencing multiple samples in 
one lane can reduce the cost; (iv) For the comparison 
across different samples by this method, it is not necessary 
to reach saturated sequencing coverage in any samples, 
although more hydroxymethylated loci could be identified 
with more sequencing coverage; and (v) Finally and most 
importantly (also see below), the comparative hMeDIP- 
seq data generated by this new approach is ready for direct 
comparative analysis across multiple samples after nor- 
malization of hMeDIP data by the corresponding input 
data of each sample. 

The new strategy used in the comparative hMeDIP-seq 
method greatly facilitates later data processing. The com- 
putational analysis for all the reads obtained from deep 
sequencing is processed as a whole for the different 
samples (ESCs versus NPCs) within the data collection. 
Thus, it is simplified, straightforward and unbiased for all 



reads alignment and all 5hmC peak calling. Indeed, we 
have made significant improvements in peak calling and 
DHMR identification. Pooled 5hmC peaks can be called 
from the un-separated hMeDIP-seq data of multiple 
samples as the sensitive DNA hydroxymethylated 
regions. Then, we can compare the 5hmC density at 
these sensitive DNA hydroxymethylated regions across 
multiple samples. With this new strategy, we have 
bypassed the problem of asymmetric peak widths and 
numbers between samples and significantly reduced the 
work load for the data processing. 

Recently, two new techniques (oxBS-seq and TAB-seq) 
have been reported to distinguish 5hmC from 5mC at 
single-base resolution (40,43). These two methods also 
provided means to detect the absolute quantitative value 
(percentage) of 5hmC in the genome. The single-base reso- 
lution 5hmC sequencing approach revealed that the 
absolute percentage of 5hmC in gene body regions 
appears to be low, contrary to previous data from the 
conventional hMeDIP-seq approach, which suggested 
that gene body regions of ESCs tend to have a large 
amount of 5hmC. Therefore, the conclusions drawn 
from the two different approaches seem contradictory. 
We believe that this discrepancy is likely caused by the 
different measurement techniques of the two 5hmC-de- 
tecting approaches. Gene body DNA (especially in 
exons) has a relatively high frequency of CpG content 
but low percentage of 5hmC at each CpG site. It is true 
that the TAB-Seq method represents a more accurate 
methodology to measure the absolute abundance of 
5hmC at base pair resolution. However, as pointed out 
by Yu et al. (40) in their Discussion, the TAB-seq 
method likely underestimates relatively low abundances 
of 5hmC sites within gene body, as these sites might 
have escaped detection because of insufficient coverage 
of sequencing depth. Accordingly, an unexpectedly low 
(underestimated) percentage of 5hmC within the gene 
bodies was observed in this study (40). This shortcoming 
could most likely be compensated by increasing the 
sequencing depth, although this would increase the 
cost even more, another limitation for routine usage of 
these methods in an average laboratory at the current 
stage. Conversely, conventional affinity-based 5hmC 
mapping (hMeDIP) is not only a cost-effective way to 
quickly project the overall landscape of 5hmC along the 
genome, but it also allows amplification of high frequency 
but weak 5hmC signals within gene bodies. The downside 
of this measurement is that it determines 5hmC distribu- 
tion at low resolution (counting 200-400 bp as one unit). It 
may also potentially over-amplify the weak signals of 
those clustering individual 5hmC sites with low abun- 
dances, adding up multiple weak 5hmC signals to make 
it appear like one strong signal within 200^100 bp window. 
Therefore, it may introduce a potential overestimation of 
the abundance of 5hmC in the gene bodies, outweighing 
the low frequency, but high abundance, of 5hmC at 
distal-regulatory elements such as the enhancers. Clearly, 
both single-base resolution 5hmC-seq and hMeDIP-seq 
approaches have their advantages and disadvantages. 

The comparative hMeDIP method we describe in this 
article retained all of the advantages from the conventional 
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hMeDIP-seq, while minimizing the aforementioned short- 
comings. The unique features of the comparative 
hMeDIP-seq method, particularly the newly developed 
algorithm for DHMR identification, systematically elimin- 
ate the conventional hMeDIP-seq-induced bias or potential 
overestimation of 5hmC abundance for the low-5hmC 
sample when the 5hmC abundance was compared among 
multiple biological samples. If the research aim is to 
identify DHMRs among samples, the optimized compara- 
tive hMeDIP-seq method can fulfill this aim with much 
lower cost and less experimental time. With no doubt, if 
the research goal is to map the absolute 5hmC distribution 
at base-pair resolution, oxBS- or TAB-seq is the method to 
choose. However, to precisely locate 5hmC sites with low 
abundance, oxBS- or TAB-seq would require much larger 
sequencing coverage than hMeDIP-seq, thus increasing the 
cost and amount of labour required. In the future, it would 
be a good strategy to combine the advantages of the com- 
parative hMeDIP-seq with the oxBS- or TAB-seq 
approaches for studying the dynamic changes and precise 
role of 5hmC in genome regulation during cell differenti- 
ation and development. For example, we may use the com- 
parative hMeDIP-seq to locate the genes or genomic 
regions with most dramatic changes of 5hmC (DHMRs). 
Then we can use targeted oxBS- or TAB-seq approaches to 
sequence these targeted genomic fragments (located by the 
comparative hMeDIP-seq approach) to map the precise 
changes of 5hmC within the region at single base reso- 
lution. By combining these two approaches, we envision 
that we could quantitatively compare the localized 5hmC 
changes between the different biological samples in a 
cost-effective user-friendly manner while yielding better 
resolution. 

Upon neural differentiation, ESCs lose pluripotency and 
gain the characteristics of neural lineage cells (44,45). Cell 
differentiation involves drastic epigenomic reprogramming, 
including the establishment of cell-specific DNA 
methylome and hydroxymethylome (11,46). The DNA 
methylomes of ESCs and NPCs have been well established 
by MeDIP-seq and Bisulfite-seq in the past few years 
(47,48). Although the DNA hydroxymethylome of ESCs 
has been established recently by affinity-based approach 
or new sequencing methods at single base resolution, the 
DNA hydroxymethylome of NPCs remained unknown. In 
this study, using the newly developed comparative 
hMeDIP-seq method, we revealed a dramatic loss of 
5hmC during neural differentiation of mouse ESCs and 
provided the first genome-wide comparative 5hmC map 
of NPCs in conjunction with that of the ESCs. We found 
that the 5hmC distribution pattern of NPCs is vastly dif- 
ferent from that of ESCs: the majority of 5hmC peaks that 
exist in ESCs are lost in NPCs, whereas a small fraction of 
gene loci undergo de novo DNA hydroxymethylation. 
Many regions that undergo de novo DNA hydroxy- 
methylation are located at genes associated with mature 
neuronal functions. We speculate that these genes may be 
repressed by 5hmC or its associated cellular complex (es) in 
NPCs, but are poised for DNA demethylation, and as a 
result can undergo gene activation upon later stages of 
neural differentiation and maturation. 



In agreement with the previous findings, the compara- 
tive hMeDIP-seq method revealed a complex correlation 
between 5hmC distribution and gene activity in both ESC 
and NPC. We found that the 5hmC levels at promoters 
(TSS ± 1 kb) had a negative correlation with gene expres- 
sion in both ESCs and NPCs, whereas the correlation 
between 5hmC levels in gene body regions and gene ex- 
pression varied depending on the cell type. 'Loss of 5hmC 
happens at down-regulated, up-regulated and unchanged 
genes during neural differentiation; however, there was a 
weak, but significant, positive correlation between the 
changes in 5hmC and gene expression. These results 
suggest that 5hmC patterns are likely another layer of 
largely uncharacterized epigenetic regulation within the 
larger landscape of cell-specific chromatin structure. It is 
possible that 5hmC in the gene body may not be an essen- 
tial and driving force for gene transcription, but rather, 
gene transcriptional activity at a steady state may affect 
5hmC generation by regulating the access of TET proteins 
towards their DNA substrates. Finally, it has been 
proposed that DNA methylation in gene body regions 
may inhibit cryptic transcription and correlate with gene 
expression (49). DNA methylation in gene body regions 
has also been reported to regulate alternative promoters 
and splicing (50,51). In the future, it will be of great 
interest to investigate whether DNA hydroxymethylation 
in gene body regions also affects alternative promoters 
and mRNA splicing. 

In summary, future investigation is required to dissect 
the intrinsic and intricate relationship between dynamic 
changes of 5hmC and specific gene expression within the 
same cell type or during cell differentiation. Nonetheless, 
our data demonstrate that the comparative hMeDIP-seq 
method is a powerful approach for genome-wide compari- 
sons of DNA hydroxymethylation across multiple 
samples. Using this method, we have uncovered the dif- 
ference in DNA hydroxymethylation between ESCs and 
NPCs. Notably, the strategies described here can also be 
applied to the genome-wide quantitative comparison of 
other DNA modifications (such as 5mC, 5caC and 5fC) 
across different biological or pathological samples. 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Tables 1-5 and Supplementary Figures 
1-4. 
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